Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Login & authentication rework, part 1: credentials management and URL parsing #3293

Merged
merged 1 commit into from
Sep 3, 2024

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Aug 11, 2024

This is a breakout of #3249 and first part of fixing our credentials management / login issues (see 3249 for details about issues, including #3072 which is the reason a significant rewrite is warranted IMHO).

The focus is this PR is solely:

  • providing a CredentialsStore abstraction that takes care of any magic / legacy identifiers
  • handling all registry url parsing & scheme / url monkeying in one place (RegistryURL)

This does:

  • fix Login to docker.io failed. (expected acArg to be "docker.io", got "registry-1.docker.io") #3245 and a number of other oddities (now enshrined in tests)
  • significantly increase testing (+~1000 lines is just tests), both unit testing for CredentialsStore and RegistryURL parsing, and integration testing, which should hopefully give us a much firmer understanding of what is supposed to happen when
  • reduce code duplication
  • greatly simplify handling of registry urls
  • allows fixing further login issues as a follow-up, on better grounds
  • allows for future implementation of namespaced hosts login

What this does NOT:

@apostasie apostasie force-pushed the dev-login-1 branch 5 times, most recently from d0e699e to 0afef56 Compare August 12, 2024 02:01
@apostasie apostasie changed the title [IGNORE FOR NOW] Login & authentication rework, part 1: credentials management and URL parsing Aug 12, 2024
@apostasie
Copy link
Contributor Author

@fahedouch @AkihiroSuda PTAL at your convenience

)

func safeRandomString(n int) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to test_registry

@apostasie apostasie marked this pull request as ready for review August 12, 2024 02:34
@apostasie
Copy link
Contributor Author

Conflict is likely trivial. Will rebase after review, in case there are any comments or concerns to address.

@apostasie apostasie force-pushed the dev-login-1 branch 2 times, most recently from d50e9da to 864c5be Compare August 15, 2024 17:08
@AkihiroSuda AkihiroSuda requested a review from a team August 15, 2024 17:09
@AkihiroSuda AkihiroSuda added the area/login authentification/ login label Aug 15, 2024
@apostasie
Copy link
Contributor Author

@AkihiroSuda we have regular network related failures on Ubuntu 20.04 / v1.6.

Symptom being no route to host:

time="2024-08-15T17:33:19Z" level=error msg="failed to call tryLoginWithRegHost" error="failed to call rh.Authorizer.Authorize: failed to fetch oauth token: Post \"[http://10.4.0.1:5000/auth\](http://10.4.0.1:5000/auth/)": dial tcp 10.4.0.1:5000: connect: no route to host" i=0
        time="2024-08-15T17:33:19Z" level=fatal msg="failed to call rh.Authorizer.Authorize: failed to fetch oauth token: Post \"[http://10.4.0.1:5000/auth\](http://10.4.0.1:5000/auth/)": dial tcp 10.4.0.1:5000: connect: no route to host"

We could retry on login for that condition - but I am more interested in fixing the root issue.

Do you think this is something in rootlesskit or bypass4netns?

@AkihiroSuda
Copy link
Member

Do you think this is something in rootlesskit or bypass4netns?

No, this seems rootful
https://github.com/containerd/nerdctl/actions/runs/10407628383/job/28823319574?pr=3293

@apostasie
Copy link
Contributor Author

Do you think this is something in rootlesskit or bypass4netns?

No, this seems rootful https://github.com/containerd/nerdctl/actions/runs/10407628383/job/28823319574?pr=3293

Yes, you are right.

So, I guess when under pressure (starting / stopping many containers), networking is breaking down or is slow to converge?

@AkihiroSuda AkihiroSuda requested a review from a team August 17, 2024 16:21
@apostasie apostasie force-pushed the dev-login-1 branch 3 times, most recently from b0d4293 to 996f7c6 Compare August 20, 2024 05:01
@AkihiroSuda AkihiroSuda requested review from a team and djdongjin August 22, 2024 15:21
Copy link
Member

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some early comments. Still have some changes left to be reviewed. thanks

cmd/nerdctl/logout.go Outdated Show resolved Hide resolved
cmd/nerdctl/login_linux_test.go Outdated Show resolved Hide resolved
cmd/nerdctl/login_linux_test.go Outdated Show resolved Hide resolved
"github.com/containerd/log"
)

type Credentials = types.AuthConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest change to type AuthConfig = types.AuthConfig. If we want to have a type alias we should keep the name the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.
I guess I do not like AuthConfig as a name, but that's fine, let's keep it the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, do you mind if we keep Credentials?
AuthConfig is really a bad name... If anything, it does evoke https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/system-level_authentication_guide/authconfig-install#authconfig-install instead of what it really is... just credentials...

It also "makes sense" that the CredentialsStore would store Credentials.

pkg/imgutil/dockerconfigresolver/credentialsstore.go Outdated Show resolved Hide resolved
Comment on lines +179 to +174
type isFileStore interface {
IsFileStore() bool
GetFilename() string
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface should be moved to the begining of the file, or at least before the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is private and only used internally in this file - there is no implementation, as it is used solely to cast docker credentials store.

In that context, do you feel it should still be moved up? (no strong opinion on my side)

Comment on lines +40 to +51
var (
ErrUnableToInstantiate = errors.New("unable to instantiate docker credentials store")
ErrUnableToErase = errors.New("unable to erase credentials")
ErrUnableToStore = errors.New("unable to store credentials")
ErrUnableToRetrieve = errors.New("unable to retrieve credentials")
)

// Errors returned by `Parse`
var (
ErrUnparsableURL = errors.New("unparsable registry URL")
ErrUnsupportedScheme = errors.New("unsupported scheme in registry URL")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are they used widely in nerdctl or by downstream? if not I prefer just returning a errors.New(...) directly when needed instead of predefining them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are they used widely in nerdctl or by downstream?

Not yet.

The reason I like them is that they allow downstream code to actually test (errors.Is(err, ErrXYZ)) the error (instead of matching strings, which I find abhorrent...), and allows providing more refined user feedback based on these (or decide to ignore certain errors in certain circumstances).

This of course applies to testing as well (as a nice benefit, where we can actually verify what error is being returned in integration testing without having to duplicate strings again).

LMK if you feel strongly about these and of course open to amending.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall motivation for me is that our error-ing is not great right now (see #3302 for example), and that is generally in the codebase.

We tend to just bubble-up system errors that are meaningless or even misleading for the user (#3197) - and at best, makes it hard to pinpoint for us when reading bug reports.

This is obviously a very large endeavor to change this situation overall, and it might take a few iterations to come up with a reasonable set of errors and informative enough wrapping, but I believe it is essential.

@apostasie
Copy link
Contributor Author

Left some early comments. Still have some changes left to be reviewed. thanks

Thanks a lot @djdongjin
Really appreciate the review.
I answered you in the comments - addressed some with the latest push - provided clarification on others - so, let me know in there how you feel, and I will look for further comments from you when you will be done reviewing.

@apostasie
Copy link
Contributor Author

apostasie commented Aug 27, 2024

Failure are unrelated:

  • ComposeUp random fail
  • networking fluke: dial tcp 10.4.0.1:5000: connect: no route to host" i=0
  • docker inability to pull an image (might be for the same reason as ^)

Could we poke the CI to restart these?

Copy link
Member

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need more time to review some of the changes, as it's a bit difficut to identify which are logic changes, and which are refactors/code relocation/etc.

Also some suggestions on reducing PR size:

  • Add UTs first in a separate PR or commit if UT itself has a lot of changes. If you're trying to fix something and the UT helps reproduce it, you can add and skip that UT (or just add it in a later PR).
  • Do refactor or move code around in a separate PR: it's difficult to identify a small change if it's mixed with a refactor.
  • You can use tools like https://github.com/modularml/stack-pr to make small PRs and make a big change incrementally.

@apostasie
Copy link
Contributor Author

I probably need more time to review some of the changes, as it's a bit difficut to identify which are logic changes, and which are refactors/code relocation/etc.

That is fine.
I do appreciate my latest PRs are more sizable - and certainly understand reviewing these things is significant work - really grateful for that @djdongjin.

Also some suggestions on reducing PR size:

  • Add UTs first in a separate PR or commit if UT itself has a lot of changes. If you're trying to fix something and the UT helps reproduce it, you can add and skip that UT (or just add it in a later PR).
  • Do refactor or move code around in a separate PR: it's difficult to identify a small change if it's mixed with a refactor.
  • You can use tools like https://github.com/modularml/stack-pr to make small PRs and make a big change incrementally.

Will definitely look into stack-pr - great tips, thanks.

Let me see if I can massage my two other "big-ish" PRs into something more amenable to review.

Meanwhile, we can just stay focused on this one - and also #3362 would be nice to get if you have a minute <- it is a small one! Pinky swear! :-)

@apostasie
Copy link
Contributor Author

apostasie commented Aug 28, 2024

Latest CI fail is the usual TestIPFSComposeUp - this one fails more often than it succeeds.

Unfortunately, I couldn't figure it out so far (#3305) - it is a massive PITA.

Anyone familiar with IPFS around?

@AkihiroSuda
Copy link
Member

IPFS

cc @ktock

@apostasie
Copy link
Contributor Author

@AkihiroSuda could you poke the CI for the flaky IPFS 20.04/1.6 run?

@apostasie
Copy link
Contributor Author

Rebased. Pending green CI.

@apostasie
Copy link
Contributor Author

apostasie commented Sep 3, 2024

Rebased.
Debug statements removed.
Pending CI.

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@apostasie
Copy link
Contributor Author

Thanks

Last push just enable the commented out test (see answer in comment about Lima error)

Copy link
Member

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@djdongjin djdongjin added this to the v2.0.0 milestone Sep 3, 2024
@djdongjin djdongjin merged commit 30c63cf into containerd:main Sep 3, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/login authentification/ login
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Login to docker.io failed. (expected acArg to be "docker.io", got "registry-1.docker.io")
3 participants